Investigating speech style specific pronunciation variation in large spoken language corpora
Abstract
In the past, linguistic research was typically conducted on relatively small datasets that were specifically designed for the research at hand. Although many large spoken language corpora have since become available, their usefulness for linguistic research is still not fully established. The research reported in this paper was conducted to illustrate the potential of large multi-purpose spoken language corpora for linguistic research. We investigated whether phonetic regularities can be identified in different speech styles. To this end, a data-driven study was conducted with a large multi-purpose spoken language corpus that includes a manually corrected broad phonetic transcription of the data. Our results show that speech-style-specific pronunciation processes can indeed be found in such a large corpus. This indicates that large multi-purpose spoken language corpora can contribute to linguistic research, if only for the purpose of hypothesis generation and verification.
Similar resources
On the Usefulness of Large Spoken Language Corpora for Linguistic Research
In the past, fundamental linguistic research was typically conducted on small data sets that were handcrafted for the specific research at hand. However, from the eighties onwards, many large spoken language corpora have become available. This study investigates the usefulness of large multi-purpose spoken language corpora for fundamental linguistic research. A research task was designed in whi...
Automatic phonetic transcription of large speech corpora
This study is aimed at investigating whether automatic phonetic transcription procedures can approximate manual transcriptions typically delivered with contemporary large speech corpora. To this end, ten automatic procedures were used to generate a broad phonetic transcription of well-prepared speech (read-aloud texts) and spontaneous speech (telephone dialogues) from the Spoken Dutch Corpus. T...
Gender in everyday speech and language: a corpus-based study
This paper presents an exploratory study on the relations between gender and everyday parlance. A “data-mining” approach is used to explore gender-specific characteristics in a large number of spontaneous telephone and face-to-face conversations. Our study focuses on speech rate (speaking rate and articulation rate), disfluencies (filled pauses and repetitions), pronunciation variation (phoneme...
Analyzing and identifying multiword expressions in spoken language
The present paper investigates multiword expressions (MWEs) in spoken language and possible ways of identifying MWEs automatically in speech corpora. Two MWEs that emerged from previous studies and that occur frequently in Dutch are analyzed to study their pronunciation characteristics and compare them to those of other utterances in a large speech corpus. The analyses reveal that these MWEs ...
Multiword expressions in spoken language: An exploratory study on pronunciation variation
The study presented in this paper was aimed at exploring the possibilities of modelling specific pronunciation characteristics of multiword expressions (MWEs) for both automatic speech recognition (ASR) and automatic phonetic transcription (APT). For this purpose, we first drew up an inventory of frequently found N-grams extracted from orthographic transcriptions of spontaneous speech contained...
Publication date: 2004